ABSTRACT

A "best evidence" review synthesis, which 
incorporates features of meta-analytic and traditional literature 
reviews, is used in this review of studies on the effects of ability 
grouping on secondary school students' achievement. The focus was on
29 studies that compared between-class ability grouping to 
heterogeneous placements. Effect sizes were used to characterize 
study results. Findings indicate that comprehensive between-class 
ability grouping plans, different forms of ability grouping, and 
ability grouping by subject (except in social studies) had no effect 
on student achievement. The finding of zero effects of grouping for 
all ability levels contradicts earlier conclusions that demonstrated 
benefits of ability grouping for high-level students and detriments 
for low-level students. Explanations for this discrepancy are 
discussed. An implication is that policy decisions about ability 
grouping must be based on criteria other than effect on academic 
achievement. A recommendation is made for reduction of between-class
ability grouping practices and consideration of cooperative learning
methods. An extensive bibliography and statistical tables are
included. (LMI)

ACHIEVEMENT EFFECTS OF ABILITY GROUPING IN SECONDARY SCHOOLS:

A BEST-EVIDENCE SYNTHESIS 



EXECUTIVE SUMMARY 

This article reviews research on the effects of ability grouping on the achievement of secondary
school students. The principal focus of the review was on studies which compared between-class
ability grouping to heterogeneous placements. Six randomized experiments, 9 matched experiments,
and 14 correlational studies made this comparison. Across twenty studies from which effect sizes
could be computed, the median effect size was essentially zero (ES = -.02), and no differences
were found in the remaining nine studies. Effect sizes were also near zero for high achievers (ES
= +.01), average achievers (ES = -.08), and low achievers (ES = -.02). While most studies
involved grades 7-9, senior high school studies did not produce results different from those
involving junior high schools. Effects were similar in all subjects except social studies, where 
heterogeneous placement was usually superior to ability grouping. A very small set of studies of 
forms of grouping other than typical between-class plans (e.g., within-class grouping, flexible 
grouping, Joplin Plan) failed to find positive effects of these methods. 

The finding of zero effects of grouping for all ability levels contradicts earlier findings from studies
comparing students in high, average, and low ability groups which had suggested that ability 
grouping was beneficial to students in high groups and detrimental to those in low groups. Several 
explanations are advanced to account for this discrepancy. 

The report concludes with a recommendation that, in the absence of any evidence of instructional 
effectiveness, secondary schools should reduce their use of between-class ability grouping.

ACHIEVEMENT EFFECTS OF ABILITY GROUPING IN SECONDARY SCHOOLS:

A BEST-EVIDENCE SYNTHESIS 

For more than seventy years, ability grouping has been one of the most controversial issues in 
education. Its effects, particularly on student achievement, have been extensively studied over 
that time period, and many reviews of the literature have been written. In recent years, a
comprehensive review of the achievement effects of ability grouping in elementary schools was
published by Slavin (1987), but only brief meta-analyses by Kulik and Kulik (1982, 1987) have
reviewed the evidence on ability grouping and heterogeneous placement in secondary schools. 

The purpose of this paper is to present a comprehensive review of all research published in English 
which evaluated the effects of ability grouping on student achievement in secondary schools.
"Secondary schools" are defined here as middle, junior, or senior high schools in the U.S., or
similarly configured secondary schools in other countries. Secondary schools can include grades as
low as five, but they usually begin with sixth or seventh grades. Ability grouping is defined as any
school or classroom organization plan which is intended to reduce the heterogeneity of instructional
groups; between-class ability grouping reduces the heterogeneity of each class for a given subject
and within-class ability grouping reduces the heterogeneity of groups within the class (e.g., reading
groups). 

Unlike the situation in elementary schools, ability grouping in secondary schools is overwhelmingly
between-class grouping (McPartland, Coldiron, & Braddock, 1987). Several closely related forms
of ability grouping are used. Sometimes students are assigned to a track within which all courses
are taken, based on some combination of composite achievement, IQ, and teacher judgments. For
example, senior high school students are often assigned to academic, general, and vocational tracks;
middle/junior high school students are often assigned to advanced, basic, and remedial tracks (in
either case, the number of tracks and the names used to describe them vary widely). This type of 
grouping plan is generally called tracking in the U.S. or streaming in Europe. It is an example 
of what Slavin (1987) called "ability-grouped class assignment." In addition to assignment to higher 
and lower sections of the same courses, tracking in senior high schools usually also involves 
different courses or course requirements. For example, a student in the academic track may have 
to take more years of mathematics than a student in the general track, or may take French III 
rather than metal shop. 

A particular form of tracking often seen in middle/junior high schools is block scheduling, where
students spend all or most of the day with one homogeneous group of students. Some schools 
rank-order students from top to bottom and assign them to, say, 7-1, 7-2, 7-3, and so on. Many 
senior high schools allow students to choose their track or to choose the level they wish to take 
in each subject, but in plans of this kind counselors tend to steer students into the level of classes 
to which they would have been assigned if the school were not allowing students a choice
(Rosenbaum, 1978). 

Another form of ability grouping common in secondary schools involves assigning students to 
ability-grouped classes for all academic subjects, but allows for the possibility that students will be
placed in a high-ranking group for one subject and a low-ranking group for another. In practice,
scheduling constraints often make this type of grouping similar to plans in which all courses are
taken within the same track. In some cases schools group by ability for some subjects and not for
others; for example, students may be in ability-grouped math and English classes but in
heterogeneous social studies and science classes. Ability grouping usually involves higher and lower
sections of the same course, but sometimes consists of assignment to completely different courses,
as when ninth graders are assigned either to Algebra I or to general math. When high achievers
are assigned to markedly different courses usually offered to older students (as when seventh
graders take algebra), this is called acceleration. More commonly, high achievers may be assigned
to "honors" or "advanced placement" sections of a given course, while low achievers may be
assigned to special "remedial" sections.

While between-class ability grouping is by far the most common type of ability grouping in
secondary schools, forms of within-class grouping are also occasionally seen. These are plans in
which students are assigned to homogeneous instructional groups within their classes. Within-class
ability grouping, such as use of reading or math groups, is the most common form of grouping at
the elementary level (McPartland et al., 1987). Complex plans, such as plans that involve grouping
across grade lines, flexible grouping for particular topics, and part-time grouping, are also
occasionally seen in secondary schools. In general, a wider range of grouping plans is used in
middle/junior high schools than in senior high schools.

Arguments for and against ability grouping have been essentially similar for seventy years. For 
example, Turney (1931), summarizing writings of the 1920s, listed the following advantages and
disadvantages: 

Advantages (according to Turney, 1931) 

1. It permits pupils to make progress commensurate with their abilities. 

2. It makes possible an adaption of the technique of instruction to the needs of the group. 

3. It reduces failures. 

4. It helps to maintain interest and incentive, because bright students are not bored by the
participation of the dull. 

5. Slower pupils participate more when not eclipsed by those much brighter. 

6. It makes teaching easier. 

7. It makes possible individual instruction to small slow groups.

Disadvantages (according to Turney, 1931)

1. Slow pupils need the presence of the able students to stimulate them and encourage them.

2. A stigma is attached to low sections, operating to discourage the pupils in these sections.

3. Teachers are unable, or do not have time, to differentiate the work for different levels of
ability. 

4. Teachers object to the slower groups. 

A research symposium, school board meeting, or PTA meeting on the topic of ability grouping in
1990 is likely to bring up much the same arguments on both sides, with two important additions:
the argument that ability grouping discriminates against minority and lower-class students (e.g.,
Braddock, 1989; Rosenbaum, 1976), and the argument that students in the low tracks receive a slower
pace and lower quality of instruction than do students in the higher tracks (e.g., Gamoran, 1989;
Oakes, 1985).

In essence, the argument in favor of ability grouping is that grouping will allow teachers to adapt 
instruction to the needs of a diverse student body, with an opportunity to provide more difficult 
material to high achievers and more support to low achievers. For high achievers, the challenge 
and stimulation of other high achievers is felt to be beneficial (see Feldhusen, 1989). Arguments 
opposed to ability grouping focus primarily on the perceived damage to low achievers, who 
experience a slower pace and lower quality of instruction; teachers who are less experienced or able 
and who do not want to teach low-track classes; low expectations for performance; and few positive 
behavioral models (e.g., Gamoran, 1989; Oakes, 1985; Persell, 1977; Rosenbaum, 1980). Because
of the demoralization, low expectations, and poor behavioral models, students in the low tracks are
felt to be more prone to delinquency, absenteeism, dropping out, and other social problems
(Crespo & Michelena, 1981; Wiatrowski, Hansell, Massey, & Wilson, 1982). With few college-bound
peers, students in the low tracks are found to be less likely to attend college than other
students (Gamoran, 1987). Ability grouping is perceived to perpetuate social class and racial 
inequities because lower-class and minority students are disproportionally represented in the lower 
tracks. Ability grouping is often considered to be a major factor in the development of elite and 
underclass groups in society (Persell, 1977; Rosenbaum, 1980). Perhaps most importantly, tracking 
is felt to work against egalitarian, democratic ideals by sorting students into categories from which 
escape is difficult or impossible. 

There are important differences between the pro-grouping and anti-grouping positions that go 
beyond the arguments themselves. Arguments in favor of ability grouping focus on effectiveness, 
saying in effect that as distasteful as grouping may be, it so enhances the learning of students 
(particularly but not only high achievers) that its use is necessary. In contrast, arguments opposed 
to grouping focus at least as much on equity as on effectiveness, on democratic values as much as 
on outcomes. In one sense, then, the burden of proof is on those who favor grouping, for if
grouping is not found to be clearly more effective than heterogeneous placement, none of the
pro-grouping arguments apply. The same is not true of anti-grouping arguments, which provide
a rationale for abolishing grouping that would be plausible even if grouping were found to have
no adverse effect on achievement. 

Research on the achievement effects of ability grouping has taken two broad forms. One type 
of research compares the achievement gains of students who were in one or another form of
grouping to those of students in ungrouped, heterogeneous placements. Another type of research
compares the achievement gains made by students in high-ability groups to those made by students 
in the low groups.

Reviews of the grouping vs. non-grouping literature have consistently found ability grouping to
have little or no impact on student achievement overall in elementary and secondary schools (e.g.,
Borg, 1965; Esposito, 1973; Findley & Bryan, 1971; Good & Marshall, 1984; Heathers, 1969; Kulik
& Kulik, 1982). Based primarily on his own empirical research, Borg (1965) claimed that ability 
grouping had a slight positive effect on the achievement of high achievers and a slight negative 
effect on low achievers, but Kulik and Kulik (1987) found no such trend. 

In contrast, researchers who have compared gains made by students in different tracks have 
generally concluded that when ability level, socioeconomic status, and other factors are controlled, 
high-track assignment accelerates achievement while low-track assignment significantly reduces 
achievement (Alexander, Cook, & McDill, 1978; Dar & Resh, 1986; Gamoran & Berends, 1987; 
Gamoran & Mare, 1989; Oakes, 1982; Persell, 1977; Sorensen & Hallinan, 1985). In fact, many
researchers and theorists in the sociological tradition maintain that tracking is a principal engine 
of social inequality in society and that it causes or greatly magnifies differences along lines of class 
and ethnicity (e.g., Braddock, 1990; Jones, Erickson, & Crowell, 1972; Schafer & Olexa, 1971; 
Vanfossen, Jones, & Spade, 1987). 

One area of research has investigated the quality of instruction offered to students in high- and 
low-ability groups, usually concluding that low-ability group classes receive a quality of instruction 
that is significantly lower than that received by students in high-track classes (e.g., Evertson, 1982; 
Gamoran, 1989; Oakes, 1985; Trimble & Sinclair, 1987). However, it is difficult to compare "quality 
of instruction" in high- and low-track classes. For example, teachers typically cover less material 
in a low-track class (e.g., Oakes, 1985). Is this an indication of poor quality of instruction or an 
appropriate pace of instruction? Students in low-track classes are more off-task than those in 
high-track classes (e.g., Evertson, 1982). Is this due to the poor behavioral models and low 
expectations in the low-track classes, or would low achievers be more off-task than high achievers 
in any grouping arrangement? However, evidence that low-track classes are often taught by less 
experienced or less qualified teachers or that they manifest other indicators of lower-quality 
instruction could justify the conclusion that regardless of measurable effects on learning, students 
in the lower tracks do not receive equal treatment. 

In addition to synthesizing research on overall effects of ability grouping on the achievement of 
high-, average-, and low-achieving secondary students, this review will attempt to reconcile research
comparing achievement gains in different tracks with research comparing grouped and ungrouped 
settings. 

Review Methods 

This article uses a review procedure called "best-evidence synthesis" (Slavin, 1986), which
incorporates the best features of meta-analytic and traditional reviews. Best-evidence syntheses
specify clear, well-justified methodological and substantive criteria for inclusion of studies in the
main review and describe individual studies and critical research issues in the depth typical of 
good-quality narrative reviews. However, whenever possible, effect sizes are used to characterize 
study outcomes, as in meta-analyses (Glass, McGaw, & Smith, 1981). Systematic literature search 
procedures, also characteristic of meta-analysis, are similarly applied in best-evidence syntheses.

Criteria for Study Inclusion

The studies on which this review is based had to meet a set of a priori criteria with respect to 
relevance to the topic and methodological adequacy. First, all studies had to involve comprehensive
ability grouping plans, which incorporated most or all students in the school. This excludes studies
of special programs for the gifted or other high achievers as well as studies of special education,
remedial programs, or other special programs for low achievers. Studies of within-class ability
grouping are included, but studies of such grouping-related programs as individualized instruction,
mastery learning, cooperative learning, and continuous-progress groupings are excluded.

Studies had to be available in English, but otherwise no restrictions were placed on study location
or year of publication. Every attempt was made to locate dissertations and other unpublished 
documents in addition to the published literature. 

Methodological requirements for inclusion. Criteria for inclusion of studies in the main review
were essentially identical to those used in an earlier review of elementary ability grouping (Slavin, 
1987). These were as follows: 

1. Ability-grouped classes were compared to heterogeneously grouped classes. This
requirement excluded a few studies that correlated "degree of heterogeneity" with
achievement gain (e.g., Millman & Johnson, 1964; Wilcox, 1963). Studies that compared
achievement gains for students in different tracks (e.g., Alexander, Cook, & McDill, 1978)
were excluded from the main review but are discussed in a separate section.

2. Achievement data from standardized or teacher-made tests were presented. This excluded 
many anecdotal reports and studies which used grades as the dependent measure. 
Teacher-made tests, used in a very small number of studies, were accepted only if there was 
evidence that they were designed to assess objectives taught in all classes. 

3. Initial comparability of samples was established by use of random assignment or matching 
of students or classes. When individual students in intact schools or classes were matched, 
evidence had to be presented that the intact groups were comparable. 

4. Ability grouping had to be in place for at least a semester.

5. At least three ability-grouped and three control classes were involved. 

The criteria outlined above excluded very few studies comparing comprehensive ability grouping 
plans to heterogeneous placements. Every study located which satisfied criteria 1, 2, and 3 also 
satisfied criteria 4 and 5. Excluding studies of special programs for high achievers (e.g., Atkinson
& O'Connor, 1963), all but two of the studies included in meta-analyses by Kulik and Kulik (1982,
1987) were also included in the present review. The exceptions were a study by Adamson (1971),
which had substantial IQ differences favoring the ability-grouped school, and one by Wilcox (1963),
which compared more and less heterogeneously tracked classes.

One major category of studies included in the present review but excluded by the Kuliks are those
which did not present data from which effect sizes could be computed (e.g., Borg, 1965; Ferri,
1971; Lovell, 1960; Postlethwaite & Denton, 1978). These studies are discussed in terms of the
direction and statistical significance of their findings.

Literature Search Procedures 

The studies included in this review were located in an extensive search. Principal sources included 
the Education Resources Information Center (ERIC), Dissertation Abstracts, and citations made 
in other reviews, meta-analyses, and primary sources. Every attempt was made to obtain a 
complete set of published and unpublished studies that met the criteria outlined above.

Computation of Effect Sizes 

Effect sizes were generally computed as the difference between the experimental and control 
means divided by the control group's standard deviation (Glass et al., 1981). In the ability grouping 
literature, the heterogeneous group is almost always considered the control group, and this 
convention is followed in the present article; positive effect sizes are ones that favored ability 
grouping, while negative effect sizes indicated higher means in the heterogeneous groups. The 
standard deviation of the heterogeneous group is also preferred as the denominator because of the
possibility that ability grouping may alter the distribution of scores. However, when means or
standard deviations were omitted in studies that otherwise met the inclusion criteria, effect sizes
were estimated when possible from t's, F's, exact p values, sums of squares in factorial designs, or
other information, following procedures described by Glass et al. (1981). 
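
The basic computation described above can be sketched in a few lines of Python. This is only an illustration of the stated definition (difference in means divided by the heterogeneous group's standard deviation); the function name and the summary statistics are hypothetical and are not taken from any study in this review.

```python
def effect_size(mean_grouped, mean_hetero, sd_hetero):
    """Effect size as described in the text: the ability-grouped mean minus the
    heterogeneous (control) mean, divided by the heterogeneous group's SD.
    Positive values favor ability grouping; negative values favor
    heterogeneous placement."""
    if sd_hetero <= 0:
        raise ValueError("control-group standard deviation must be positive")
    return (mean_grouped - mean_hetero) / sd_hetero


# Hypothetical posttest summary statistics, for illustration only.
es = effect_size(mean_grouped=51.0, mean_hetero=52.0, sd_hetero=10.0)
print(f"ES = {es:+.2f}")
```

With these made-up numbers the heterogeneous classes scored one tenth of a standard deviation higher, so the effect size is negative under the sign convention used in this article.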

Several of the studies included in this review presented data comparing gain scores without 
reporting actual pre- or posttest means. Standard deviations of gain scores are typically lower
than those of raw scores (to the degree that pre-post correlations exceed +0.5), so effect sizes
computed on gain scores are often inflated. If pre-post correlations are known, effect sizes from
gain scores can be transformed to the scale of posttest values. However, because none of the studies
using gain scores also provided pre-post correlations, a pre-post correlation of +0.8 was assumed
(following Slavin, 1987). Using a formula from Glass et al. (1981), this correlation produces a
multiplier of 0.632, which was used to deflate effect size estimates from gain score data. The
purpose of this and other procedures was to attempt to put all effect size estimates in the same
metric, the unadjusted standard deviation of the heterogeneous classes. However, because this
multiplier is only a rough approximation, effect sizes from studies using gain scores should be
interpreted with even more caution than that which is warranted for effect sizes in general.
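
The multiplier of 0.632 is consistent with the standard relation between the standard deviation of gain scores and that of raw scores when pretest and posttest variances are equal: SD(gain) = SD(raw) x sqrt(2(1 - r)). The sketch below assumes that this is the formula referred to; the function names and the example effect size are hypothetical.

```python
import math


def gain_score_multiplier(pre_post_r):
    """Ratio SD(gain) / SD(raw), assuming equal pretest and posttest variances.
    Equals 1.0 at r = 0.5 and falls below 1.0 as r rises above 0.5."""
    if not -1.0 <= pre_post_r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.sqrt(2.0 * (1.0 - pre_post_r))


def deflate_gain_effect_size(es_gain, pre_post_r=0.8):
    """Rescale an effect size computed on gain scores to the posttest SD metric."""
    return es_gain * gain_score_multiplier(pre_post_r)


print(round(gain_score_multiplier(0.8), 3))      # multiplier at the assumed r = +0.8
print(round(deflate_gain_effect_size(0.50), 3))  # hypothetical gain-score ES of 0.50
```

Because the assumed correlation of +0.8 is itself a guess, the rescaled values inherit that uncertainty, which is the reason for the caution urged above.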

Another deviation from usual meta-analytic procedure used in the present review involved
adjustments of posttest scores for any pretest differences. This was done either by subtracting
pretest means from posttests (if the same tests were used), by converting pre- and posttest means
to z-scores and then subtracting (if different tests were used), or by using covariance-adjusted
scores. However, even when such adjustments were made affecting the numerator of the effect 
size formula, the denominator remained the unadjusted posttest standard deviation. 
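
A minimal sketch of the first two adjustments follows. The same-test case follows the text directly; the z-score case is one plausible reading (each test is standardized against the heterogeneous group's standard deviation on that test), which is an assumption rather than a documented procedure. All names and numbers are hypothetical.

```python
def adjusted_es_same_test(pre_grouped, post_grouped, pre_hetero, post_hetero,
                          post_sd_hetero):
    """Same test at pre and post: difference in mean gains, divided by the
    unadjusted posttest SD of the heterogeneous group."""
    gain_difference = (post_grouped - pre_grouped) - (post_hetero - pre_hetero)
    return gain_difference / post_sd_hetero


def adjusted_es_different_tests(pre_grouped, pre_hetero, pre_sd_hetero,
                                post_grouped, post_hetero, post_sd_hetero):
    """Different tests at pre and post: express each group difference in
    z-score units (assumed here to use the heterogeneous group's SD on each
    test), then subtract the pretest difference from the posttest difference."""
    z_pre = (pre_grouped - pre_hetero) / pre_sd_hetero
    z_post = (post_grouped - post_hetero) / post_sd_hetero
    return z_post - z_pre


# Hypothetical means: the grouped classes start 2 points ahead and end 1 point ahead.
unadjusted = (50.0 - 49.0) / 10.0
adjusted = adjusted_es_same_test(42.0, 50.0, 40.0, 49.0, 10.0)
print(f"unadjusted ES = {unadjusted:+.2f}, adjusted ES = {adjusted:+.2f}")
```

In this made-up case the unadjusted comparison appears to favor ability grouping, while the adjusted one favors the heterogeneous classes, which shows why pretest differences were taken into account.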

One effect size is reported for each study (see Bangert-Drowns, 1986). When multiple subsamples, 
subjects, or tests were used, medians were computed across the data points. For example, if four 
measures were used with three subgroups (e.g., high, middle, and low achievers), the effect size for 
the study as a whole would be the median of the twelve (4 x 3) resulting effect sizes. Whenever
possible, findings were also broken down by achievement level (high, average, low), and separate
effect sizes were also computed for each major subject. 
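
The within-study aggregation rule can be illustrated as follows. The 4 x 3 grid of effect sizes is invented for the example; only the procedure (one median per study, plus medians by achievement level) comes from the text.

```python
from statistics import median

# Hypothetical effect sizes for one study: 4 measures (rows) by
# 3 subgroups (columns: high, average, low achievers).
effect_sizes = [
    [0.10, -0.05, -0.20],
    [0.05, 0.00, -0.10],
    [0.15, -0.10, -0.05],
    [0.00, -0.15, 0.05],
]

# Study-level effect size: median of all twelve data points.
all_points = [es for row in effect_sizes for es in row]
study_es = median(all_points)

# Breakdown by achievement level: median down each column.
levels = ("high", "average", "low")
by_level = {name: median(column) for name, column in zip(levels, zip(*effect_sizes))}

print(f"study ES = {study_es:+.3f}")
for name in levels:
    print(f"{name:>7}: {by_level[name]:+.3f}")
```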

In pooling findings across studies, medians rather than means were used, principally to avoid giving 
too much weight to outliers. However, any measure of central tendency in a meta-analysis or
best-evidence synthesis should be interpreted in light of the quality and consistency of the studies 
from which it was derived, not as a finding in its own right. 
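
The reason for preferring medians can be seen in a small example. The study effect sizes below are hypothetical; one of them is deliberately extreme.

```python
from statistics import mean, median

# Hypothetical study-level effect sizes; the last value is an outlier.
study_effect_sizes = [-0.10, -0.05, -0.02, 0.00, 0.03, 0.05, 1.20]

pooled_mean = mean(study_effect_sizes)
pooled_median = median(study_effect_sizes)

print(f"mean   = {pooled_mean:+.3f}")    # pulled upward by the single outlier
print(f"median = {pooled_median:+.3f}")  # unaffected by how extreme the outlier is
```

A single aberrant study moves the mean well away from zero while leaving the median where most of the studies lie.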

Research on Ability Grouping in Secondary Schools 

A total of 29 studies of tracking or streaming in secondary schools met the inclusion criteria listed 
earlier. The studies, their major characteristics, and their findings are listed in Table 1. 



Table 1 Here 



The studies listed in Table 1 are organized in three categories according to their research designs. 
Six studies used random assignment of students to ability-grouped or heterogeneous classes. Nine 
studies took groups of students, matched them individually on IQ, composite achievement, and 
other measures, and then assigned one of each matched pair of students to an ability-grouped class, 
one to a heterogeneous class. The quality of these randomized or matched experimental designs 
is very high, and the findings of the 15 studies using such designs must be given special weight. 
The remaining 14 studies investigated existing schools or classrooms which used or did not use 
ability grouping, and then either selected matched groups of students from within each type of 
school or used analyses of covariance or other statistical procedures to equate the groups. The 
difficulty inherent in such designs is that any differences between schools that are systematically 
related to ability grouping would be confounded with the practice of ability grouping per se. For 
example, a secondary school that used heterogeneous grouping might have a staff, principal, or 
community more concerned about equity, affective development, or other goals than would a 
"matched" school that used ability grouping. However, several of the correlational studies used very 
large samples and longitudinal designs, and these provide important additional information not 
obtainable from the typically smaller and shorter experimental studies. 

Within each category studies are listed in descending order of sample size. All other things being 
equal, therefore, studies near the top of Table 1 should be considered as better evidence of the 
effects of ability-grouping than studies near the end of the Table. However, the nature and quality 
of the studies are discussed in more detail in the following sections. 

Overall Findings 

Across the 29 studies listed in Table 1, the effects of ability grouping on student achievement are 
essentially zero.
The median effect size for the 20 studies from which effect sizes could be estimated was -.02, and
none of the nine additional studies found statistically significant effects. Counting the studies with
nonsignificant differences as though they had effect sizes of .00, the median effect size for all 29
studies would be .00. Results from the 15 randomized and matched experimental studies were not
much different; the median effect size was -.06 for the 13 studies from which effect sizes could be
estimated. In nine of these thirteen studies (including all five of the randomized studies) results 
favored the heterogeneous groups, but these effects are mostly very small. 
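
The step of counting nonsignificant studies as effect sizes of .00 can be sketched as below. The individual study values are hypothetical placeholders (the actual values appear in Table 1); the function name is likewise an assumption made for the example.

```python
from statistics import median


def median_counting_nulls(estimated_effect_sizes, n_nonsignificant):
    """Median across studies after counting each study that reported only
    nonsignificant differences (no computable effect size) as ES = .00."""
    padded = list(estimated_effect_sizes) + [0.0] * n_nonsignificant
    return median(padded)


# Hypothetical estimable effect sizes from five studies.
estimated = [-0.30, -0.10, -0.04, 0.00, 0.20]

print(median(estimated))                    # median of estimable studies only
print(median_counting_nulls(estimated, 2))  # after adding two null studies
```

Adding zeros can only move the median toward zero, so this convention is conservative with respect to claims of any grouping effect in either direction.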

There are few consistent patterns in the study findings. Most of the studies involved grades 7-9, 
with ninth graders sometimes in junior high schools and sometimes in senior high schools. No 
apparent trend is discernible within this range. Above the ninth grade the evidence is too sparse
for firm conclusions. Lovell (1960) found that high achievers performed significantly better in
ability-grouped English classes, but there were no effects in biology or algebra and no effects for
average or low achievers. In a four-year study of students in grades 9-12, Borg (1965) found
significant positive effects of ability grouping for average and low achievers in math, but found no
differences in science or for high achievers. Cohorts followed from grades 7-10 and 8-11 showed
no significant differences on any measure for any ability level. On the other hand, Thompson
(1974), in a study of eleventh-grade social studies, found the largest effects favoring heterogeneous
grouping of all studies located (ES = -.48), while Kline (1964), in another four-year study of
students in grades 9-12, found no differences. 

Twelve of the 29 studies tracked students for all subjects according to one composite ability or 
achievement measure. The remaining seventeen studies grouped on the basis of performance in 
one or more specific subjects. However, there were no differences in the outcomes of these 
different forms of ability grouping. In addition, there were no consistent patterns in terms of the 
number of ability groups to which students were assigned (the great majority of studies used three).
Study duration had no apparent impact on outcome. Studies which used adjusted gain scores
produced the same effects as other studies, and the use of the adjustment of gain scores described
above made no difference in outcomes.

There was no discernible pattern of findings with respect to different subjects, with one possible 
exception. Studies by Marascuilo and McSweeney (1972), Thompson (1974), and Fowlkes (1931)
found relatively strong effects favoring heterogeneous grouping in social studies, and three 
additional studies by Peterson (1966), Martin (1927), and Postlethwaite and Denton (1978) found 
no differences or slight effects in the same direction. This is not enough evidence to conclusively 
point to a positive effect of heterogeneous grouping in social studies, but it is important to note 
that all three of the randomized or matched experimental studies found differences in this direction. 

There were no consistent effects according to study location. All four of the British studies found 
no differences between streamed and unstreamed classes. A large, longitudinal Swedish study by 
Svensson (1962), not shown in Table 1 because it lacked adequate evidence of initial equality, also
found no differences between streamed and unstreamed classes. Urban, suburban, and rural schools 
had similar outcomes. The one study which involved large numbers of minority students, a 
randomized experiment in a New York City high school by Ford (1974), found no differences 
between ability-grouped and heterogeneous math classes.

Studies conducted before 1950 were no more likely than more recent studies to find achievement
differences. On this topic, it is interesting to note that experimental-control studies of ability
grouping have not been done in recent years. The only study of the 1980s, by Kerckhoff (1986),
was done by a sociologist who focused his attention on differences between students in different
streams. This study is described in more detail later on. Otherwise, the most recent
experimental-control comparisons were done in the early 1970s. 

Differential Effects According to Achievement Levels

One of the most important questions about ability grouping in secondary schools concerns the 
degree to which it affects students at different achievement levels differently. As noted earlier, 
many researchers and reviewers, particularly those working in the sociological tradition, have
emphasized the relative impact of grouping for different groups of students far more than the
average effect for all students. 

Twenty-one of the 29 studies presented in Table 1 presented data on the effects of ability grouping 
on students of different ability levels. Most studies broke their samples into three categories (high, 
average, and low achievers), but some used two or four categories. 

Across the 15 studies from which effect sizes could be computed, the median effect size was +.01 
for high achievers, -.08 for average achievers, and -.02 for low achievers. Effects of this size are 
indistinguishable from zero, and if all the nonsignificant differences found in studies from which 
effect sizes could not be computed are counted as effect sizes of .00, the median effect size for 
each level of student becomes .00. In addition, only one of seven studies from which effect sizes
could not be computed (Lovell, 1960) found significantly positive effects of ability grouping for high
achievers, and none of these studies found significant effects in either direction for average and
low achievers. The randomized and matched experimental studies provided slightly more support
for the idea that ability grouping has a differential effect; the median effect sizes for high, average,
and low achievers were +.05, -.10, and -.06, respectively. It is interesting to note that the study 
by Borg (1965), which is often cited to support the differential effect of ability grouping on 
students of different ability levels, in fact provides very weak support for this phenomenon. Across 
two measures given to members of four-year cohorts which principally included secondary years, 
significant effects favoring ability grouping were found for high achievers in one out of eight 
comparisons, for average achievers in three out of eight, and for low achievers in one out of eight. 
Only in a cohort that went from grades 4 to 7 were there significant effects favoring heterogeneous 
grouping for low achievers. 
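
The zero-imputation step described above can be illustrated with a minimal sketch. It assumes the usual standardized mean difference as the effect size (difference in group means divided by the heterogeneous group's standard deviation). The function names and the list of effect sizes are hypothetical and are not the values in Table 1.

```python
from statistics import median

def effect_size(mean_grouped, mean_hetero, sd_hetero):
    """Standardized mean difference; positive values favor ability grouping.
    Assumed definition, used here for illustration only."""
    return (mean_grouped - mean_hetero) / sd_hetero

def median_with_imputed_zeros(effect_sizes, n_unreported):
    """Median after counting each nonsignificant result that lacks a
    computable effect size as an effect size of .00."""
    return median(list(effect_sizes) + [0.0] * n_unreported)

# Hypothetical effect sizes for one achievement level (NOT the data in Table 1).
hypothetical_es = [-0.30, -0.15, -0.08, -0.02, 0.05, 0.20, 0.40]

print(median(hypothetical_es))                        # median of computable effect sizes
print(median_with_imputed_zeros(hypothetical_es, 6))  # median after adding six zeros
```

In this illustration a small nonzero median moves to exactly .00 once the uncomputable, nonsignificant results are counted as zeros, which is the pattern the text reports for each achievement level.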

It might be expected that differential effects of track placement would build over time, and that 
longitudinal studies would show more of a differential impact than one-year studies. The one 
multi-year randomized study, by Marascuilo and McSweeney (1972), did find that over a two-year 
period, students in the top social studies classes gained slightly more than similar students in 
heterogeneous classes (ES=+.14), while middle (ES=-.37) and low (ES=-.43) groups gained
significantly less than their ungrouped counterparts. However, across seven multi-year correlational
studies of up to five years' duration, not one found a clear pattern of differential effects.

A few studies provided additional information on differential impacts of ability grouping by 
investigating effects of grouping on high or low achievers only. For example, Torgelson (1963) 
randomly assigned low-achieving students in grades 7-9 to homogeneous or heterogeneous classes.
Across several performance measures, the median effect size was +.13 (non-significantly favoring
ability grouping). Similarly, Borg and Prpich (1966) randomly assigned low-achieving tenth graders
to ability-grouped or heterogeneous English classes, and found that there were no differences in
one cohort. In a second cohort, differences favoring ability grouping on a writing measure were
found, but there were no differences on eight other measures. 

Studies of ability grouping of high achievers are difficult to distinguish from studies of special 
programs for the gifted. Well-designed studies of programs for the gifted generally find few effects
of separate programs for high achievers unless the programs include acceleration (exposure to
material usually taught at a higher grade level) (Fox, 1979; Kulik & Kulik, 1984). That is, grouping
per se has little effect on the achievement of high achievers. An outstanding study that illustrates 
this is a dissertation by Mikkelson (1962), who randomly assigned high-achieving seventh and eighth 
graders to ability-grouped or heterogeneous math classes. The seventh-grade homogeneous classes 
were given enrichment, but the eighth graders were accelerated, skipping to ninth-grade algebra. 
No effects were found for the seventh graders. The accelerated eighth graders of course did 
substantially better than similar students who were not accelerated on an algebra test, and they did 
no worse on a test of eighth grade math. 

Taken together, research comparing ability-grouped to heterogeneous placements provides little 
support for the proposition that high achievers gain from grouping while low-achievers lose. 
However, there is an important limitation to this conclusion. In most of the studies which
compared tracked to untracked grouping plans (including all of the randomized and matched 
experimental studies), tracked students took different levels of the same courses (e.g., high, average, 
or low sections of Algebra 1). Yet much of the practical impact of tracking, particularly at the 
senior high school level, is on determining the nature and number of courses taken in a given area. 
The experimental studies do not compare students in Algebra 1 to those in Math 9, or students 
who take four years of math to those who take two. The conclusions drawn in this section are
limited, therefore, to the effects of between-class grouping within the same courses, and should not 
be read as indicating a lack of differential effects of tracking as it affects course selection and 
course requirements. 

Other Forms of Ability Grouping 

The studies discussed above and summarized in Table 1 evaluated the most common forms of 
ability grouping in secondary schools: full-time, between-class ability grouping for one or more
subjects. However, a few studies have evaluated other grouping plans. 

The most widely used form of grouping in elementary schools, within-class ability grouping, has 
also been evaluated in a few studies involving middle and junior high schools. Campbell (1965) 
compared the use of three math groups within the class to heterogeneous assignment in two 
Kansas City junior high schools. There were no differences between the two programs in 
achievement. Harrah (1956) compared five types of within-class grouping in grades 7-9 in West
Virginia, and found ability grouping to be no more successful than other grouping methods. Note
that these findings conflict with those of studies of within-class ability grouping in mathematics in
the upper elementary grades, which tended to support the use of math groups (Slavin, 1987).

Vakos (1969) evaluated the use of a combination of heterogeneous and homogeneous instruction
in eleventh-grade social studies classes in Minneapolis. Students were grouped by ability two days
each week, but heterogeneously grouped the other three days. No achievement differences were
found. Zweibelson, Bahnmuller, & Lyman (1955) evaluated a similar mixed approach to teaching
ninth-grade social studies in New Rochelle, New York, and also found no achievement differences.
Chiotti (1961) compared a flexible plan for grouping junior high school students across grade lines
for mathematics to both ability-grouped and heterogeneous grouping plans, and again found no
differences in achievement. A cross-grade grouping arrangement similar to the Joplin Plan (Slavin,
1987) was compared to within-class grouping in reading by Chismar (1971) in grades 4-8.
Significantly positive effects of this program were found in grades 4 and 7 but not 5, 6, and 8.

Reconciling Track/No Track and High Track/Low Track Studies 

As noted earlier in this review, two very different traditions of research have dominated research 
on ability grouping. One involves comparisons of ability-grouped to heterogeneous placements. 
The other involves comparisons of the progress made by students in different ability groups or 
tracks. While there has been little experimental research comparing ability-grouped to 
heterogeneous placements since the early 1970s, research comparing the achievement of students 
in different tracks largely began in the 1970s and continues to the present. 

The findings of high track/low track studies of ability grouping conflict with those emphasized in 
this review, in that they generally find that even after controlling for IQ, socioeconomic status, 
pretests, and other measures, students in high tracks gain significantly more in achievement than
do students in low tracks, especially in mathematics (see Gamoran & Berends, 1987, for a review). 
How can these findings be reconciled with those of the experimental studies? 

One important difference between experimental and correlational studies of ability grouping is 
that, as mentioned earlier, correlational studies (especially at the senior high school level) often 
include not only the effects of being in a high, average, or low class, but also the effects of 
differential course-taking. Students in academic tracks may score better than those in general or 
vocational classes because they take more courses or more advanced courses. The experimental 
studies comparing grouped and ungrouped classes are all studies of grouping per se, holding 
course-taking and other factors constant, while the correlational studies examine tracking as it is
in practice, where track placement implies differences in course requirements, course-taking
patterns, and so on. Also, experimental track vs. no track studies are rare beyond the ninth grade, 
while most correlational studies comparing students in high vs. low tracks involve senior high 
schools. The lack of track vs. no track studies at the senior high school level is hardly surprising 
given the nearly universal use of some form of tracking at that level. However, tracking usually 
has a different meaning in senior than in junior high school. While junior high school tracking 
mostly involves different levels of courses (e.g. high English vs. low English), senior high tracking 
is more likely to involve completely different patterns of coursework (e.g., metal shop vs. French
III). Also, the problem of dropouts becomes serious in senior high school; a study of twelfth
graders unavoidably excludes the students who may have suffered most from being in the low track
and left school (see Gamoran, 1987). This could reduce observed differences between high- and 
low-track students. 

There is limited evidence, however, that differences in course-taking or grade level account for
the different conclusions of the track/no track and high track/low track studies. Four-year
longitudinal studies in U.S. senior high schools by Kline (1964) and Borg (1965) found no
differential effects of track placement for high, average, and low achievers (as compared to similar
students in untracked placements). Presumably, course-taking patterns in these senior high school
studies varied by track. A correlational study by Alexander and Cook (1982) found that while
taking more courses in senior high school did increase achievement (net of background factors),
different course-taking patterns in different tracks did not account for track differences in
achievement. Gamoran (1987) found that track effects on math and science achievement were
explained in part by the fact that students in the academic tracks take more math and science
courses and, in particular, more advanced courses in these areas. However, no such patterns were
seen on reading, vocabulary, writing, or civics achievement measures. Gamoran notes the difficulty
of disentangling track and course-taking, which are highly correlated in math and science (and of
course both track and course-taking are strongly correlated with ability, socioeconomic status, and 
other factors). It is certainly logical to expect correlational studies of senior high school tracking 
to find different effects of different track placements because of different course-taking patterns, 
but because of confounding of tracking, course-taking, and student background factors, this is 
difficult to determine conclusively. 

Another likely explanation for different findings of track/no track and high track/low track studies 
has to do with the difficulty of statistically controlling for large differences. Students in higher
tracks tend to achieve at much higher levels than those in lower tracks (both before and after
taking secondary courses), and statistically controlling for these differences is probably not enough
to completely remove the influence of ability or prior performance on later achievement. Further,
students in higher tracks are also likely to be higher in such attributes as motivation, internal locus
of control, academic self-esteem, and effort, factors which are not likely to be controlled in
correlational studies by measures of prior ability and achievement.

To understand the difficulty of controlling for large initial differences between students, imagine 
an experiment in which a new instructional method was to be evaluated. The experimenter selects
a group of students who have high test scores and high IQ scores, and are nominated by their 
teachers as being hard-working, motivated, and college material. This group becomes the 
experimental group, and the remaining students serve as the control group. To control for the 
differences between the groups, prior composite achievement and socioeconomic status are used 
as covariates or control variables. 

In such an experiment, no one would doubt that regardless of the true effectiveness of the 
innovative treatment, the experimental group would score far better than the control group, even 
controlling for prior achievement and socioeconomic status. No journal or dissertation committee
would accept such a study. Yet this "experiment" is essentially what is being done when researchers
compare students in different tracks. When there are significant pretest differences, use of
statistical controls through analysis of covariance or regression is considered inadequate to equate
the groups. Most often, the statistical controls will undercontrol for true differences (Lord, 1960;
Reichardt, 1979). Yet high- and low-track students usually differ in pretests or IQ by one to two
standard deviations, an enormous systematic difference for which no statistical procedure can 
adequately control. 
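
The thought experiment above can be sketched as a small simulation. Everything in it is an illustrative assumption rather than data from any study reviewed here: a latent "ability" score, a pretest and posttest that each measure it with error, and a "high track" selected on the latent score. No treatment effect is simulated, and the adjustment uses the pooled within-group slope of the classic analysis of covariance.

```python
import random
from statistics import mean

def ancova_adjusted_difference(pre, post, in_high):
    """Posttest difference between groups after adjusting for the pretest
    with the pooled within-group regression slope."""
    groups = {True: ([], []), False: ([], [])}
    for x, y, g in zip(pre, post, in_high):
        groups[g][0].append(x)
        groups[g][1].append(y)
    sxy = sxx = 0.0
    means = {}
    for g, (xs, ys) in groups.items():
        mx, my = mean(xs), mean(ys)
        means[g] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    raw_gap = means[True][1] - means[False][1]
    pretest_gap = means[True][0] - means[False][0]
    return raw_gap - slope * pretest_gap

rng = random.Random(0)
n = 20000
ability = [rng.gauss(0, 1) for _ in range(n)]        # hypothetical latent trait
pre = [a + rng.gauss(0, 0.6) for a in ability]       # noisy pretest (the covariate)
post = [a + rng.gauss(0, 0.6) for a in ability]      # noisy posttest, NO treatment effect

# "Track" chosen on the latent trait, which the pretest only partly captures.
selected = [a > 0.5 for a in ability]
adjusted_selected = ancova_adjusted_difference(pre, post, selected)

# Comparison case: groups formed at random, as in a randomized experiment.
randomized = [rng.random() < 0.3 for _ in range(n)]
adjusted_random = ancova_adjusted_difference(pre, post, randomized)

print(round(adjusted_selected, 2), round(adjusted_random, 2))
```

Under these assumptions the selected group shows a sizable "effect" after adjustment even though none was built in, while the randomized comparison stays near zero. This is the undercontrol problem the text attributes to comparisons of existing tracks.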

The only study which compared both tracked to untracked schools and high-track to low-track
students was a five-year longitudinal study by Kerckhoff (1986) in Britain. This study illustrates
the problem of controlling for large differences. For example, in mathematics, boys in the high
track of 3-group ability-grouping programs gained about 11 z-score points from a test given at age
11 to one given at age 16, while students in a remedial track gained 15 z-score points. Yet the
regression coefficient comparing the high track to ungrouped students was +2.34, indicating
performance about 42% of a standard deviation above "predicted" performance. In contrast, the
remedial-track boys had a regression coefficient (in comparison to ungrouped students) of -.72,
indicating performance about 13% of a standard deviation below "predicted" performance, despite
the fact that the remedial students actually gained more than the top-track students. The reason
for this is that the remedial students started out (at age 11) scoring 1.64 standard deviations below
the ungrouped students, while top-track students started out 1.02 standard deviations above the
ungrouped students, a total difference between top-track and remedial students of 2.66. No
regression or analysis of covariance can adequately control for such large pretest differences.
Because of unreliability in the measures and less than perfect within-group correlations of pre- and
posttests, "predicted" scores based on pretests and other covariates will (other things being equal)
be too low for high achievers and too high for low achievers. 
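
The arithmetic behind this point can be sketched as follows. The two starting positions are the figures quoted above. The reliability values are hypothetical, since no reliability is reported here, and the formula assumes the simplest case in which the adjustment slope is attenuated in proportion to pretest reliability and track placement has no real effect.

```python
# Starting positions quoted in the text (SDs relative to ungrouped students, age 11).
remedial_start = -1.64
top_start = 1.02

pretest_gap = top_start - remedial_start
print(round(pretest_gap, 2))  # 2.66

def spurious_gap(pretest_gap, reliability, true_slope=1.0):
    """Posttest gap that survives covariance adjustment when the pretest is
    unreliable and placement has no effect. Assumes the estimated slope is
    attenuated to reliability * true_slope (an illustrative simplification)."""
    return (true_slope - reliability * true_slope) * pretest_gap

# Hypothetical within-group reliabilities of the pretest.
for reliability in (0.9, 0.8, 0.7):
    print(reliability, round(spurious_gap(pretest_gap, reliability), 2))
```

Even with a fairly reliable pretest, a 2.66 standard deviation starting gap leaves a residual difference of several tenths of a standard deviation in this sketch, in the direction of "predicted" scores being too low for the top track and too high for the remedial track.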

Another factor that can contribute to overestimates of the effects of curriculum track on 
achievement in studies lacking heterogeneous comparison groups is fan spread. Put simply, high 
achievers usually gain more per year than do low achievers, so over time the gap between high 
and low achievers grows. This increasing gap cannot be unambiguously ascribed to ability grouping 
or other school practices, as it occurs under virtually all circumstances. A student who is
performing at the 16th percentile in the sixth grade and is still at the 16th percentile in twelfth
grade will be further "behind" the twelfth-grade mean in grade equivalents, for example (Coleman
& Karweit, 1972).
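
A minimal sketch of fan spread, assuming normally distributed scores: the student's percentile is held fixed while the spread of scores, expressed in grade equivalents, widens with grade. The two standard deviations used are hypothetical values chosen only to show the mechanism.

```python
from statistics import NormalDist

def grade_equivalents_behind(percentile, sd_in_grade_equivalents):
    """Distance below the grade mean, in grade equivalents, for a student at
    the given percentile of a normal score distribution."""
    z = NormalDist().inv_cdf(percentile / 100)
    return -z * sd_in_grade_equivalents

# Hypothetical spreads: SD assumed to widen from 1.5 grade equivalents in
# grade 6 to 3.0 grade equivalents in grade 12.
behind_grade_6 = grade_equivalents_behind(16, 1.5)
behind_grade_12 = grade_equivalents_behind(16, 3.0)

print(round(behind_grade_6, 2), round(behind_grade_12, 2))
```

The student's relative standing never changes in this sketch, yet the distance from the mean in grade equivalents doubles because the assumed spread doubles.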

An additional factor that can contribute to spurious findings indicating a benefit of being in the 
high track is that factors other than test scores influence placement decisions. For example, a study 
by Balow (1964) found that on math tests not used for group placement, there was enormous 
overlap between students in supposedly homogeneous seventh-grade math classes. More than 72% 
of the students scored between the lowest score in the top group and the highest score in the 
bottom group. Among these students in the "area of overlap," students who were in the top group
gained the most in math achievement over the course of the year, while those in the low group
gained the least.

On its surface this study provides support to the "self-fulfilling prophecy" argument. Yet consider
what is going on. Imagine two students with identical scores, one assigned to the high group and 
one to the low group. Why were they so assigned? Random error is a possibility, but all the 
systematic possibilities weigh in the direction of higher performance for the student assigned to the 
high group. Since teacher judgment was involved, teachers may have accurate knowledge to enable 
them to predict who will do well and who will not. The actual assignments were done on different 
tests than those used in the Balow study; it is likely that students who scored low on Balow's 
pretests but were put in the high groups scored high on the test used for placement, and then 
regressed to a higher mean on Balow's posttest. 
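
The regression argument above can be sketched with a simulation. All quantities are hypothetical: a latent ability score, a placement test used by the school, and a separate pretest and posttest used by the researcher, each measuring ability with independent error. Group placement has no effect on the simulated posttest.

```python
import random
from statistics import mean

rng = random.Random(1)
rows = []
for _ in range(50000):
    ability = rng.gauss(0, 1)                       # hypothetical latent trait
    placement_test = ability + rng.gauss(0, 0.6)    # used by the school to assign groups
    study_pretest = ability + rng.gauss(0, 0.6)     # a different test, not used for placement
    study_posttest = ability + rng.gauss(0, 0.6)    # no effect of group is simulated
    if placement_test > 0.5:
        group = "high"
    elif placement_test < -0.5:
        group = "low"
    else:
        group = "middle"
    rows.append((group, study_pretest, study_posttest))

# Keep only students with nearly identical scores on the study pretest.
band = [r for r in rows if -0.1 < r[1] < 0.1]

# Mean pretest-to-posttest change for matched students, by assigned group.
gain = {
    g: mean(post - pre for grp, pre, post in band if grp == g)
    for g in ("high", "low")
}
print({g: round(v, 2) for g, v in gain.items()})
```

Among students matched on the study pretest, those placed in the high group "gain" and those placed in the low group "lose" in this sketch, purely because placement carries extra information about ability that the matching test missed.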

What this discussion is meant to convey is not that different tracks do or do not have a differential 
impact on student achievement, but that comparisons of students in existing tracks cannot tell us 
one way or another. To learn about the differential impacts of track placement, there are two
types of research that might be done. One would be to randomly assign students at the margin
to different tracks, something that has never been done. The other is to compare similar students
randomly assigned to ability-grouped or ungrouped systems. This has been done several times, and,
as noted earlier in this review, there is no clear trend indicating that students in high-track classes
learn any more than high-achieving students in heterogeneous classes, or that students in low-track
classes learn any less than low-achieving students in heterogeneous classes. 

Why is Ability Grouping Ineffective?

The evidence summarized in Table 1 and discussed in this review is generally consistent with the 
conclusions of earlier reviews comparing homogeneous and heterogeneous grouping (e.g., Kulik &
Kulik, 1982, 1987; Noland, 1985), but runs counter to two quite different kinds of "common sense." 
On one hand, it is surprising to find that assignment to the low ability group is not detrimental to 
student learning. A substantial literature has indicated the low quality of instruction in low groups 
(e.g., Evertson, 1982; Gamoran, 1989; Oakes, 1985), and a related body of research has documented 
the negative impact of ability grouping on the motivations and self-esteems of students assigned 
to low groups (e.g., Coule, 1974; Schafer & Olexa, 1971; Trimble & Sinclair, 1987). How can the 
effect of ability grouping on low-achieving students be zero, as this review concludes?

On the other hand, another kind of "common sense" would argue that, at least in certain subjects, 
ability grouping is imperative in secondary schools. How could an eighth-grade math teacher teach 
a class composed of students who are fully ready for algebra and students who are still not firm
in subtraction and multiplication? How does an English teacher teach literature and writing to a 
class in which reading levels range from third to twelfth grade? Yet study after study, including 
randomized experiments of a quality rarely seen in educational research, finds no positive effect 
of ability grouping in any subject or at any grade level, even for the high achievers most widely 
assumed to benefit from grouping. 

The present review cannot provide definitive answers to these questions. However, it is worthwhile 
to speculate on them. 

One possibility is that the standardized tests used in virtually all the studies discussed in this review
are too insensitive to pick up effects of grouping. This seems particularly plausible in looking at
tests of reading, because reading has not generally been taught as such in secondary schools.
However, standardized tests of mathematics do have a great deal of face validity and curricular
relevance, and these show no more consistent a pattern of outcomes. Marascuilo & McSweeney
(1972) used both teacher-made and standardized measures of social studies achievement and found 
similar results with each. 

Another possibility is that it simply does not matter whom students sit next to in a secondary class. 
Secondary teachers use a very narrow range of teaching methods, overwhelmingly using some form 
of lecture/discussion (Goodlad, 1983). In this setting, the direct impact of students on one another 
may be minimal. If this is so, then any impacts of ability grouping on students would have to be 
mediated by teacher characteristics or behaviors or by student perceptions and motivations. 

Studies contrasting teaching behaviors in high- and low-track classes usually find that the low tracks
have a slower pace of instruction and lower time on-task (e.g., Evertson, 1982; Oakes, 1982). Yet,
as noted earlier, the meaning and impact of these differences are not self-evident. It may be that
a slower pace of instruction is appropriate with lower-achieving students, or that pace is relatively
unimportant because a higher pace with lower mastery is essentially equivalent to a lower pace with
higher mastery. Higher time on-task should certainly be related to higher achievement (Brophy
& Good, 1986), but the comparisons of time on task between high and low tracks are misleading.
What would be important to compare is time on task for low achievers in homogeneous and
heterogeneous classes, because low achievers may simply be off-task more than high achievers
regardless of their class placement. In this regard, it is important to note that Evertson, Sanford,
& Emmer (1981) found time on-task to be lower in extremely heterogeneous junior high school 
classes than in less heterogeneous ones because teachers had difficulty managing the more 
heterogeneous classes. 

The lesson to be drawn from research on ability grouping may be that unless teaching methods are 
systematically changed, school organization has little impact on student achievement. This
conclusion would be consistent with the equally puzzling finding that substantial reductions in class 
size have little impact on achievement (Slavin, 1989); if teachers continue to use some form of 
lecture/discussion/seatwork/quiz, then it may matter very little in the aggregate which students the 
teachers are facing or how many of them there are. In contrast, forms of ability grouping which 
were found to make a difference in the upper elementary grades, the Joplin Plan (cross-grade 
grouping in reading to allow for whole-class instruction) and within-class grouping in mathematics 
(Slavin, 1987), both significantly change time allocations and instructional activities within the
classroom.

Alternatives to Ability Grouping 

If the effects of ability grouping on student achievement are zero, then there is little reason to 
maintain the practice. As noted earlier in this article, arguments in favor of ability grouping 
depend on assumptions about the effectiveness of grouping, at least for high achievers. In the 
absence of any evidence of effectiveness, these arguments cannot be sustained. 

Yet there is also no evidence that simply moving away from traditional ability grouping practices
will in itself enhance student achievement, and there are legitimate concerns expressed by teachers 
and others about the practical difficulties of teaching extremely heterogeneous classes at the 
secondary level. How can schools moving away from traditional ability grouping use this 
opportunity to contribute to student achievement? 

One alternative to ability grouping often proposed (e.g., Oakes, 1985) is use of cooperative learning
methods, which involve students working in small, heterogeneous learning groups. Research on 
cooperative learning consistently finds positive effects of these methods if they incorporate two 
major elements: group goals and individual accountability (Slavin, 1990). That is, the cooperating 
groups must be rewarded or recognized based on the sum or average of individual learning 
performances. Cooperative learning methods of this kind have been successfully used at all grade 
levels, but there is less research on them in grades 10-12 than in grades 2-9 (see Newmann & 
Thompson, 1987). Cooperative learning methods have also had consistently positive impacts on 
such outcomes as self-esteem, race relations, acceptance of mainstreamed academically handicapped
students, and ability to work cooperatively (Slavin, 1990).

One category of cooperative learning methods may be particularly useful in middle schools moving 
toward heterogeneous class assignments. These are Cooperative Integrated Reading and 
Composition (Stevens, Madden, Slavin, & Farnish, 1987) and Team Assisted Individualization -
Mathematics (Slavin, Madden, & Leavey, 1984; Slavin & Karweit, 1985). Both of these methods
are designed to accommodate a wide range of student performance levels in one classroom, using
both homogeneous and heterogeneous within-class groupings. These programs have been
successfully researched in grades 3-6, but are often used up to the eighth grade level. 

Other alternatives to between-class ability grouping have also been found to be successful in the 
upper elementary grades (see Slavin, 1987) and could probably be effective in middle schools as 
well. These include within-class ability grouping in mathematics (e.g., teaching two or three math
groups within a heterogeneous class), and the Joplin Plan in reading. The Joplin Plan involves
regrouping students for reading across grade levels but according to reading level, so that no 
within-class reading groups are necessary. However, while these alternatives to between-class 
grouping are promising because of their success in the upper elementary grades, the few studies 
of within-class ability grouping at the junior high school level have not found this practice to be 
effective (Campbell, 1965; Harrah, 1956) and the one middle school study of the Joplin Plan found 
only inconsistent positive effects (Chismar, 1971). 

For descriptions of secondary schools implementing alternatives to traditional ability grouping, see 
Slavin, Braddock, Hall, & Petza (1989).

Limitations of This Review 

It is important to note several limitations of the present review. Perhaps the most important is
that in none of the studies reviewed here were there systematic observations made of teaching and 
learning. Observational studies and outcome studies have proceeded on parallel tracks; it would 
be important to be able to relate evidence of outcomes to changes in teacher behaviors or 
classroom characteristics. Another limitation, mentioned earlier, is that almost all studies reviewed 
here used standardized tests of unknown relationship to what was actually taught. A third
limitation is the age of most of the studies reviewed. It is possible that schools, students, or ability
grouping have changed enough since the 1960s or 1970s to make conclusions from these and older
studies tenuous. 

As noted earlier, the results reported in this review mainly concern the effects of grouping per se, 
with little regard for the effects of tracking on such factors as course-taking. Effects of tracking 
on differential course-taking are most important in senior high schools. There is a need for 
additional research comparing tracked to untracked situations at the senior high school level, 
particularly research designed to disentangle the effects of tracking from those of differential 
course-taking.

In addition, it would add greatly to the understanding of ability grouping in secondary schools to 
have evaluations or even descriptions of a wider range of alternatives to traditional ability grouping. 
The few studies of within-class grouping, cross-grade grouping, and flexible grouping plans are not
nearly adequate to explore alternatives. Cooperative learning, often proposed as an alternative to
ability grouping, has frequently been found to increase student achievement in ability grouped as
well as ungrouped secondary classes (Slavin, 1990; Newmann & Thompson, 1987), but no study has
yet compared cooperative learning in heterogeneous classes to traditional instruction in
homogeneous ones. Descriptions of creative alternatives to ability grouping currently exist only at
the anecdotal level (Slavin, Braddock, Hall, & Petza, 1989). 

Conclusions

While there are limitations to the scope of this review and to the studies on which it is based, 
there are several conclusions that can be advanced with some confidence. These are as follows: 

1. Comprehensive between-class ability grouping plans have little or no effect on the 
achievement of secondary students. This conclusion is most strongly supported in grades 
7-9, but the more limited evidence that does exist from studies in grades 10-12 also fails to 
support any effect of ability grouping. 

2. Different forms of ability grouping are equally ineffective. 

3. Ability grouping is equally ineffective in all subjects, except that there may be a negative
effect of ability grouping in social studies. 

4. Assigning students to different levels of the same course has no consistent positive or
negative effects on students of high, average, or low ability.

For the narrow but extremely important purpose of determining the impact of ability grouping on 
standardized achievement measures, the studies reviewed here are exemplary. Six of them randomly
assigned individual students to ability-grouped or heterogeneous classes, and nine more individually
matched students and then assigned them to one or the other grouping plan. Many of the studies
followed students for two or more years. If there were any true effect of ability grouping on
student achievement, this set of studies would surely have detected it.

For practitioners, the findings summarized above mean that decisions about whether or not to 
group by ability must be made on bases other than likely impacts on achievement. Given the
antidemocratic, antiegalitarian nature of ability grouping, the burden of proof should be on those 
who would group rather than those who favor heterogeneous grouping, and in the absence of 
evidence that grouping is beneficial, it is hard to justify continuation of the practice. The possibility 
that students in the low groups are at risk for delinquency, dropping out, and other social problems 
(e.g., Rosenbaum, 1980) should also weigh against the use of ability grouping. Yet schools and 
districts moving toward heterogeneous grouping have little basis for expecting that abolishing ability 
grouping will in itself significantly accelerate student achievement unless they also undertake
changes in curriculum or instruction likely to improve actual teaching. 

There is much research still to be done to understand the effects of ability grouping in secondary 
schools on student achievement. Studies of grouping at grades 10-12, studies of a broader range 
of alternatives to grouping, and studies relating observations to outcomes of grouping are areas of 
particular need. Enough research has been done comparing tracked to heterogeneous classes and 
achievement in high, middle, and low tracks, at least up through the ninth grade. It is time to 
move beyond these simple comparisons to consider more fully how secondary schools can adapt 
instruction to the needs of a heterogeneous student body. 
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LO 
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3-group AG or hetero "core" 
classes. Both classes taught 
by same teacher. Iowa Tests of 
Basic Skills used as posttests. 
LO 
classes. Students compared on 
Metropolitan Achievement Test. 
AV = Average Achieving Students 
standardized algebra measure. 
made tests. 
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4-group AG or hetero English 
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California Achievement Test 
gains. 


HI +.22 
HI AV -.03 
LO AV -.13 

LO -.20 
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Language 


+.06 
-.13 


-.04 


Wilcutt, 
1969 


7 


Bloomington, 
IN 

(lab school) 
1 yr. 


Matched students assigned to 
4-group flexible AG in math or 
to hetero. Grouping changed 

8 (inm In the vf*.ar 
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-.15 
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Holy & Sutton, 
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9 
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OH 


148 

Students 


1 sem. 


Matched students assigned to AG, 
hetero algebra classes. Same 
teacher taught all classes. 
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Martin, 
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New Haven, 
CT 


83 

Students 


1 yr. 


Matched students assigned to 
3-group AG or hetero. 


HI 
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LO 
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+.23 


Reading +.17 
Math -h.n 
Language +.03 
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Social 
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Britain 



8,509 
Students 



5 yrs. 



Fogelman, Essen, & 6-10 
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Britain 



5,923 
Students 



4 yrs. 



Borg, 
1965 



6-9 

7-10 

8-11 

9-12 



Utah 



2,934 
Students 



4 yrs. 



Ferri, 
1971 



5-6 



Britain 



28 Schools 

1,716 

Students 



2 yrs. 



Breidenstine, 
1936 



7-9 



Soudenburg, 
PA 



11 Schools 
860 

Students 



1 yr. 



Purdom, 
1929 



700 

Students 



1 sem. 



Postlethwaite & 

Denton, 

1978; 

Newbold, 1977 



5-7 



Britain 
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450 

Students 



2 yrs. 



Bachman, 
1968 



Portland, 
OR 



15 Schools 
23 Teachers 
404 Students 



1 yr. 



Kline, 
1964 



9-12 



St. Louis, 
MO 



4 Schools 



4 yrs. 






Effect Sizes 

By 

Subject 



Longitudinal study of students 
throughout Britain who attended 
streamed or unstreamed secondary 
schools. 



Reading +.02 
Math +.03 



+.03 



Retrospective study compared 
students who had been in streamed, 
partially streamed, or heterogeneous 
schools throughout secondary school, 
controlling for grade 5 general ability. 

Longitudinal study of students in 
districts using AG compared to 
students in neighboring district 
using heterogeneous grouping, 
controlling for pretests. 

Streamed and non-streamed schools 
matched on 7+ (grade 2) reading, 
followed 4 years in junior school, 
2 years in secondary. 

Compared students in 4 AG, 
7 hetero schools matched on IQ. 



HI (0) 
AV (0) 
LO (0) 



HI (0) 
AV (0) 
LO (0) 



Reading +.02 
Math +.03 



Math (0) 
Science (0) 



+.03 



Math 
English 



(0) 
(0) 



Composite 
Achievement 



(0) 



(0) 



.19 



Matched students in AG, hetero 


HI 


-.02 


English 


-.02 




English and algebra classes 


AV 


-.08 


Algebra 


.00 


i 01 


compared in achievement. 


LO 


+.07 








Students within one secondary 


HI 


(0) 


Math 


(0) 


(0) 


school assigned to streamed 


AV 


(0) 


English 


(0) 




or unstreamed halls. Achievement 


LO 


(0) 


Social 






assessed on national examinations. 






Studies 


(0) 










French 


(0) 




Math classes in schools using AG 






Math 


(0) 


(0) 


compared to hetero classes, 












controlling for IQ. 












Retrospective study of successive 


V.HI 


-.02 


Reading 


-.05 


+.01 


cohorts of students, one in 3- or 


HI 


+.08 


Language +.07 




4-group AG, one hetero, in 4 


AV 


.00 


Math 


+.01 




schools. Compared on standardized 


LO 


-.02 









tests after 4 years of AG or hetero 
placement. 
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Stoakes, 7 Cedar Rapids, 3 Schools 1 yr. 

1964 IA 



Martin, 6-8 Nashville, 3 Schools 2 yrs. 

1959 TN 



Chiotti, 9 Issaquah, 3 Schools 1 yr. 

1961 WA 



Fowlkes, 7 Glendale, 2 Schools 1 sem. 
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Cochraiie, 8 Kalamazoo, 1 School 1 yr. 

1961 MI 
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Effect Sizes 
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IbCal 



Matched mentally advanced and 


HI 


(0) 


Reading 


(0) 


slow-learning students compared 


LO 


(0) 


English 


(0) 


in schools using AG or hetero 






Math 


(0) 


assignment. Compared on 










standardized tests. 










Retrospective study compared 


HI 


(0) 


Reading 


(0) 


gains on Stanford Achievement 


AV 


(0) 


Language 


(0) 


Tests for 2 AG and 1 hetero 


LO 


(0) 


Math 


(0) 


school from grades 6-8. 










Matched students in 3-group 


HI 


+.14 


Math 


+.18 


AG and hetero schools compared 


AV 


+.06 






in math achievement. 


LO 


+.35 






Students in school using 3-group 


HI 


-.45 


Reading 


-.04 


AG based on IQ matched with 


AV 


-.18 


Language 


-.17 


students in hetero school. 


LO 


-.05 


Math 


-.17 


Compared gains on Stanford 






Social 




Achievement Tests. 






Studies 


-.21 


Compared students grouped 


HI 


(0) 


Math 


(0) 


separately for English, math 


AV 


(0) 


English 


(0) 


to previous year (hetero) 


LO 


(0) 







(0) 



(0) 



+.18 



.20 



(0) 



Students matched in IQ, age, 
sex, sch. 
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